Toward Corpus-Based Machine Translation for Standard Arabic
Author
Abstract
The paper defines corpus-based machine translation and outlines its possible applications. The study is based on a bilingual corpus of French and Arabic texts and on the alignment of translation units. The alignment criteria combine linguistic and statistical information. The study also suggests procedures for building a machine translation system based on parallel translated corpora.
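As a rough illustration of what combining linguistic and statistical alignment criteria can look like, the sketch below scores candidate pairs of translation units with a length-ratio cue (statistical) and a bilingual-lexicon cue (linguistic). It is not the paper's method: the function names, the equal weighting, and the toy lexicon are all assumptions made for this example.

```python
from typing import List, Mapping, Set


def length_score(src: str, tgt: str) -> float:
    """Statistical cue (assumed): ratio of the shorter to the longer unit, in [0, 1]."""
    ls, lt = len(src), len(tgt)
    if ls == 0 or lt == 0:
        return 0.0
    return min(ls, lt) / max(ls, lt)


def lexicon_score(src: str, tgt: str, lexicon: Mapping[str, Set[str]]) -> float:
    """Linguistic cue (assumed): share of source words with a listed translation in the target."""
    src_words = src.lower().split()
    tgt_words = set(tgt.split())
    if not src_words:
        return 0.0
    hits = sum(1 for w in src_words if lexicon.get(w, set()) & tgt_words)
    return hits / len(src_words)


def alignment_score(src: str, tgt: str, lexicon: Mapping[str, Set[str]],
                    weight: float = 0.5) -> float:
    """Weighted mix of both cues; the 0.5 default is an arbitrary, hypothetical choice."""
    return weight * length_score(src, tgt) + (1 - weight) * lexicon_score(src, tgt, lexicon)


def best_alignment(src_units: List[str], tgt_units: List[str],
                   lexicon: Mapping[str, Set[str]], weight: float = 0.5) -> List[int]:
    """For each source unit, return the index of the highest-scoring target unit."""
    return [
        max(range(len(tgt_units)),
            key=lambda j: alignment_score(s, tgt_units[j], lexicon, weight))
        for s in src_units
    ]
```

A real aligner would normally add constraints this greedy sketch ignores, such as monotonic ordering and one-to-many links, and would use a morphological analyzer rather than exact word matching for Arabic.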
Similar resources
A New Method for Extracting Named Entities in Classical Arabic
In Natural Language Processing (NLP) studies, developing resources and tools contributes to the breadth and effectiveness of research in each language. In recent years, Arabic Named Entity Recognition (ANER) has attracted the attention of NLP researchers because of its significant impact on other NLP tasks such as machine translation, information retrieval, question answering, query result...
Grapheme to phoneme conversion: an Arabic dialect case
We aim to develop a speech-to-speech translation system between Modern Standard Arabic and the Algiers dialect. Such a system must include a text-to-speech module, which in turn must include a grapheme-to-phoneme converter. The Algiers dialect is an Arabic dialect that shares most of the NLP problems of Modern Standard Arabic. Furthermore, it could be considered an under-resourced language becau...
Using Verb Paraphrases for Arabic-to-English Example-Based Translation
We have developed an experimental Arabic-to-English example-based machine translation (EBMT) system, which exploits a bilingual corpus to find examples that match fragments of the input source-language text--Modern Standard Arabic (MSA), in our case--and imitates their translations. Translation examples were extracted from a collection of parallel, sentence-aligned, unvocalized Arabic-English docu...
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. The main task of coreference resolution systems is therefore to identify terms that refer to a unique entity. A coreference resolution tool could be...
Exploiting Out-of-Domain Data Sources for Dialectal Arabic Statistical Machine Translation
Statistical machine translation for dialectal Arabic is characterized by a lack of data, since data acquisition involves the transcription and translation of spoken language. In this study we develop techniques for extracting parallel data for one particular dialect of Arabic (Iraqi Arabic) from out-of-domain corpora in different dialects of Arabic or in Modern Standard Arabic. We compare two dif...